22 research outputs found

    Faster Approximate String Matching for Short Patterns

    Full text link
    We study the classical approximate string matching problem, that is, given strings PP and QQ and an error threshold kk, find all ending positions of substrings of QQ whose edit distance to PP is at most kk. Let PP and QQ have lengths mm and nn, respectively. On a standard unit-cost word RAM with word size wlognw \geq \log n we present an algorithm using time O(nkmin(log2mlogn,log2mlogww)+n) O(nk \cdot \min(\frac{\log^2 m}{\log n},\frac{\log^2 m\log w}{w}) + n) When PP is short, namely, m=2o(logn)m = 2^{o(\sqrt{\log n})} or m=2o(w/logw)m = 2^{o(\sqrt{w/\log w})} this improves the previously best known time bounds for the problem. The result is achieved using a novel implementation of the Landau-Vishkin algorithm based on tabulation and word-level parallelism.Comment: To appear in Theory of Computing System

    An efficient algorithm for generating super condensed neighborhoods

    No full text
    Abstract. Indexing methods for the approximate string matching problem spend a considerable effort generating condensed neighborhoods. Here, we point out that condensed neighborhoods are not a minimal representation of a pattern neighborhood. We show that we can restrict our attention to super condensed neighborhoods which are minimal. We then present an algorithm for generating Super Condensed Neighborhoods. The algorithm runs in O(m⌈m/w⌉s), where m is the pattern size, s is the size of the super condensed neighborhood and w the size of the processor word. Previous algorithms took O(m⌈m/w⌉c) time, where c is the size of the condensed neighborhood. We further improve this algorithm by using Bit-Parallelism and Increased Bit-Parallelism techniques. Our experimental results show that the resulting algorithm is very fast.

    New Bit-Parallel Indel-Distance Algorithm

    No full text

    Dynamic Edit Distance Table under a General Weighted Cost Function

    No full text

    A Practical Index for Genome Searching

    No full text
    Current search tools for computational biology trade e#- ciency for precision, losing many relevant matches. We push in the direction of obtaining maximum e#ciency from an indexing scheme that does not lose any relevant match. We show that it is feasible to search the human genome e#ciently on an average desktop computer
    corecore